Multilingual bottle-neck features and its application for under-resourced languages

Authors

  • Ngoc Thang Vu
  • Florian Metze
  • Tanja Schultz
Abstract

In this paper we present our latest investigation of multilingual bottle-neck (BN) features and their application to rapid adaptation to new languages. We show that the overall performance of a Multilayer Perceptron (MLP) network improves significantly when it is initialized with a multilingual MLP. Furthermore, ASR performance improves both on the languages used for multilingual MLP training and on a new language. We propose a new strategy, the “open target language” MLP, for training more flexible models for language adaptation; it is particularly suited to small amounts of training data. The final results on the Vietnamese GlobalPhone database show a 15.8% relative improvement in Syllable Error Rate (SyllER) for the ASR system trained with 22.5 h of data and a 16.9% relative improvement for the system trained with only 2 h of data.
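The initialization idea in the abstract can be illustrated with a minimal sketch. It assumes that the hidden layers (including the bottle-neck layer) are copied from a multilingual MLP and that only the output layer is re-created for the target language; the abstract does not spell out this detail, so treat it as an assumption. The function names, layer sizes, and target counts below are hypothetical and are not taken from the paper.

```python
import copy
import random


def init_mlp(layer_sizes, seed=0):
    """Randomly initialise an MLP as a list of {"W", "b"} layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        layers.append({
            "W": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)],
            "b": [0.0] * n_out,
        })
    return layers


def init_from_multilingual(multilingual_layers, n_target_outputs, seed=1):
    """Copy all layers except the output layer from a multilingual MLP.

    Assumption: the output layer is replaced by a freshly initialised one
    sized for the target language's phone set.
    """
    rng = random.Random(seed)
    # Deep copy so later training of the target MLP leaves the source intact.
    transferred = copy.deepcopy(multilingual_layers[:-1])
    # The new output layer takes the same number of inputs as the old one.
    n_in = len(multilingual_layers[-1]["W"][0])
    output_layer = {
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
              for _ in range(n_target_outputs)],
        "b": [0.0] * n_target_outputs,
    }
    return transferred + [output_layer]


# Hypothetical sizes: 39 inputs, 64 hidden, 8 bottle-neck, 64 hidden,
# 120 multilingual targets; the new language has 40 targets.
multilingual = init_mlp([39, 64, 8, 64, 120])
target = init_from_multilingual(multilingual, n_target_outputs=40)
```

After this step the target-language MLP would be trained further on the new language's data, and the activations of the 8-unit layer would serve as BN features. In a real toolkit the same transfer is usually done by loading saved weights rather than copying lists.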

Similar resources

Combination of multilingual and semi-supervised training for under-resourced languages

Multilingual training of neural networks for ASR is widely studied these days. It has been shown that languages with little training data can benefit largely from the multilingual resources for training. The use of unlabeled data for the neural network training in semi-supervised manner has also improved the ASR system performance. Here, the combination of both methods is presented. First, mult...

Multilingual Deep Bottle Neck Features a Study on Language Selection and Training Techniques

Previous work has shown that training the neural networks for bottle neck feature extraction in a multilingual way can lead to improvements in word error rate and average term weighted value in a telephone key word search task. In this work we conduct a systematic study on a) which multilingual training strategy to employ, b) the effect of language selection and amount of multilingual training ...

Investigating the learning effect of multilingual bottle-neck features for ASR

Deep neural networks (DNNs) have become state-of-the-art techniques of automatic speech recognition in the last few years. They can be used at the preprocessing level (Tandem or BottleNeck features) or at the acoustic model level (hybrid Hidden Markov Model/DNN). Moreover, they allow exploiting multilingual data to improve monolingual systems. This paper presents our investigation of the learni...

Lexicon+TX: rapid construction of a multilingual lexicon with under-resourced languages

Most efforts at automatically creating multilingual lexicons require input lexical resources with rich content (e.g. semantic networks, domain codes, semantic categories) or large corpora. Such material is often unavailable and difficult to construct for under-resourced languages. In some cases, particularly for some ethnic languages, even unannotated corpora are still in the process of collect...

SpeeD @ MediaEval 2014: Spoken Term Detection with Robust Multilingual Phone Recognition

In this paper, we attempt to resolve the Spoken Term Detection (STD) problem for under-resourced languages by phone recognition with a multilingual acoustic model of three languages (Albanian, English and Romanian). The Power Normalized Cepstral Coefficients (PNCC) features are used for improved robustness to noise.

Publication date: 2012